Matrix Multiplication on High-Density Multi-GPU Architectures: Theoretical and Experimental Investigations
Authors
Abstract
Matrix multiplication (MM) is one of the core problems in high-performance computing, and its efficiency impacts the performance of almost all matrix-based computations. High-density multi-GPU architectures escalate the complexity of this classical problem, even as they greatly exceed the capacity of earlier homogeneous multicore architectures. To fully exploit the potential of such multi-accelerator architectures for multiplying matrices, we systematically evaluate the performance of two prevailing tile-based MM algorithms: the standard algorithm and Strassen's algorithm. We use a high-density multi-GPU server, the CS-Storm, which can support up to eight NVIDIA GPU cards, and we test three generations of GPU cards: K20Xm, K40m, and K80. Our results show that (1) Strassen is often faster than the standard method on multicore architectures, but it is not beneficial for sufficiently small matrices; and (2) Strassen is more efficient than the standard algorithm on low-density GPU configurations, but it quickly loses its advantage on high-density GPU configurations, because Strassen requires more additions than the standard algorithm. The experimental results indicate that, although Strassen needs fewer arithmetic operations than the standard algorithm, the heterogeneity of the computing resources is a key factor in determining the best-practice algorithm.
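To make the trade-off concrete, the following is a minimal sketch of one recursion level of Strassen's algorithm for even-sized square matrices: it performs 7 block multiplications instead of the standard 8, at the cost of 18 extra block additions. This is an illustrative NumPy sketch, not the paper's tile-based multi-GPU implementation; the function name `strassen_step` is ours.

```python
import numpy as np

def strassen_step(A, B):
    """One level of Strassen's recursion: 7 block products instead of 8,
    traded for extra block additions (the cost that dominates on
    high-density GPU configurations, per the abstract)."""
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]

    # The 7 Strassen block multiplications.
    M1 = (A11 + A22) @ (B11 + B22)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)

    # Recombine into the four quadrants of C = A @ B.
    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6
    return np.block([[C11, C12], [C21, C22]])

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
assert np.allclose(strassen_step(A, B), A @ B)
```

In a real tile-based GPU implementation the block products `M1..M7` would be dispatched to accelerator BLAS calls, while the additions fall on whichever resource handles them, which is why resource heterogeneity shifts the break-even point between the two algorithms.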
Similar resources
Sparse-matrix vector multiplication on hybrid CPU+GPU platform
Sparse matrix-vector multiplication (SpMV) is a basic operation in many linear algebra kernels, so it is interesting to have an SpMV routine for modern architectures such as GPUs. Because SpMV is an irregular computation, the CPU performs comparably to the GPU, which makes hybrid CPU+GPU architectures attractive for this routine. We have therefore designed a hybrid SpMV algorithm that uses both a CPU and a GPU. We have ex...
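The irregularity the blurb refers to can be seen in a plain CSR (compressed sparse row) SpMV sketch: each row gathers from data-dependent column positions, which limits GPU-style regular parallelism. This is an illustrative example only; the cited paper's CPU+GPU partitioning scheme is not reproduced here.

```python
import numpy as np

def spmv_csr(data, indices, indptr, x):
    """Compute y = A @ x for A stored in CSR form (data, indices, indptr)."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        start, end = indptr[i], indptr[i + 1]
        # Irregular gather: the column indices differ per row, so memory
        # access is data-dependent rather than regular/coalesced.
        y[i] = np.dot(data[start:end], x[indices[start:end]])
    return y

# 3x3 example matrix: [[4, 0, 9], [0, 7, 0], [0, 0, 5]]
data = np.array([4.0, 9.0, 7.0, 5.0])
indices = np.array([0, 2, 1, 2])
indptr = np.array([0, 2, 3, 4])
x = np.array([1.0, 2.0, 3.0])
print(spmv_csr(data, indices, indptr, x))  # → [31. 14. 15.]
```

A hybrid scheme would typically split the rows (or nonzeros) between the CPU and GPU according to their relative throughput on this access pattern.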
A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms
New multi-core CPU and GPU architectures promise high computational power at a low cost if suitable computational algorithms can be developed. However, parallel programming for such architectures is usually non-portable, low-level and error-prone. To make the computational power of new multi-core architectures more easily available to Modelica modelers, we have developed the ParModelica algorit...
Toward optimised skeletons for heterogeneous parallel architecture with performance cost model
High performance architectures are increasingly heterogeneous with shared and distributed memory components, and accelerators like GPUs. Programming such architectures is complicated and performance portability is a major issue as the architectures evolve. This thesis explores the potential for algorithmic skeletons integrating a dynamically parametrised static cost model, to deliver portable p...
Co-processing SPMD Computation on GPUs and CPUs on Shared Memory System
Heterogeneous parallel systems with multiple processors and accelerators are becoming ubiquitous due to their better cost-performance and energy efficiency. These heterogeneous processor architectures have different instruction sets and are optimized either for task latency or for throughput. Challenges arise in programmability and performance when executing SPMD computations on heterogene...
A portable and high-performance matrix operations library for CPUs, GPUs and beyond
High-performance computing systems today include a variety of compute devices such as multi-core CPUs, GPUs and many-core accelerators. OpenCL allows programming different types of compute devices using a single API and kernel language. However, there is no standard matrix operations library in OpenCL for operations such as matrix multiplication that works well on a variety of hardware from mul...